Abstract: We prove achievability of the recently characterized 
quadratic Gaussian rate-distortion function (RDF) subject to the 
constraint that the distortion is uncorrelated to the source. This 
result is based on shaped dithered lattice quantization in the 
limit as the lattice dimension tends to infinity and holds for all 
positive distortions. It turns out that this uncorrelated distortion 
RDF can be realized causally. This feature, which stands in 
contrast to Shannon's RDF, is illustrated by causal transform 
coding. Moreover, we prove that by using feedback noise shaping 
the uncorrelated distortion RDF can be achieved causally and 
with memoryless entropy coding. Whilst achievability relies upon 
infinite dimensional quantizers, we prove that the rate loss 
incurred in the finite dimensional case can be upper-bounded 
by the space filling loss of the quantizer and, thus, is at most 
0.254 bit/dimension. 

I. Introduction 

Shannon's rate-distortion function R(D) for a stationary 
zero-mean Gaussian source X with memory and under the 
MSE fidelity criterion can be written in a parametric form 
(the reverse water-filling solution) [1] 
where $S_X(\omega)$ denotes the power spectral density (PSD) of $X$ 
and the distortion PSD $S_Z(\omega)$ is given by 

$$S_Z(\omega) = \begin{cases} \theta, & \text{if } S_X(\omega) > \theta \\ S_X(\omega), & \text{otherwise.} \end{cases} \tag{1c}$$

The water level $\theta$ is chosen such that the distortion constraint (1b) is satisfied. 
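As a numerical illustration of the reverse water-filling solution, the following minimal sketch finds the water level by bisection on a sampled PSD. It assumes the standard form of the solution, namely that the distortion equals the average of $S_Z(\omega)$ over frequency and that the rate equals the average of $\frac{1}{2}\log_2(S_X(\omega)/S_Z(\omega))$. The function name `reverse_water_filling`, the uniform frequency grid and the AR(1) example are illustrative assumptions, not part of the original text.

```python
import numpy as np

def reverse_water_filling(S_x, D, tol=1e-12):
    """Return (theta, S_z, rate_bits) for positive PSD samples S_x on a uniform grid.

    Assumes 0 < D <= mean(S_x). Averages over the grid approximate
    (1 / 2 pi) times the integral over [-pi, pi).
    """
    lo, hi = 0.0, float(np.max(S_x))
    # mean(min(theta, S_x)) is nondecreasing in theta, so bisect on the water level.
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if np.mean(np.minimum(theta, S_x)) < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    S_z = np.minimum(theta, S_x)                 # distortion PSD, equation (1c)
    rate = np.mean(0.5 * np.log2(S_x / S_z))     # contributes zero where S_z == S_x
    return theta, S_z, rate

# Illustrative example: unit-variance AR(1) source with coefficient 0.9 (assumed values).
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S_x = (1 - 0.9**2) / np.abs(1 - 0.9 * np.exp(-1j * w))**2
theta, S_z, rate = reverse_water_filling(S_x, D=0.1)
```

For a white unit-variance source the water level coincides with $D$, and the sketch returns $\frac{1}{2}\log_2(1/D)$ bits per sample.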

It is well known that in order to achieve Shannon's RDF 
in the quadratic Gaussian case, the distortion must be inde- 
pendent of the output. This clearly implies that the distortion 
must be correlated to the source. 

Interestingly, many well known source coding schemes ac- 
tually lead, by construction, to source-uncorrelated distortions. 
In particular, this is the case when the source coder satisfies 
the following two conditions: a) The linear processing stages 
(if any) achieve perfect reconstruction (PR) in the absence of 
quantization; b) the quantization error is uncorrelated to the 
source. The first condition is typically satisfied by PR filter- 
banks [2], transform coders [3] and feedback quantizers [4]. 
The second condition is met when subtractive (and often when 
non-subtractive) dither quantizers are employed [5]. Thus, 
any PR scheme using, for example, subtractively dithered 
quantization, leads to source-uncorrelated distortions. 

An important fundamental question, which was raised by 
the authors in a recent paper [6], is: "What is the impact on 
Shannon's rate-distortion function, when we further impose the 
constraint that the end-to-end distortion must be uncorrelated 
to the input?" 

In [6], we formalized the notion of $R^\perp(D)$, which is the 
quadratic rate-distortion function subject to the constraint that 
the distortion is uncorrelated to the input. For a Gaussian 
source $X \in \mathbb{R}^N$, we defined $R^\perp(D)$ as [6] 

$$R^\perp(D) \triangleq \min_{\substack{Y:\ \mathrm{E}[X(Y-X)^T]=0, \\ \frac{1}{N}\mathrm{tr}(K_{Y-X}) \le D,\ \frac{1}{N}|K_{Y-X}|^{1/N} > 0}} \frac{1}{N} I(X;Y), \tag{2}$$

where the notation $K_X$ denotes the covariance matrix of $X$ 
and $|\cdot|$ refers to the determinant. For zero mean Gaussian 
stationary sources, we showed in [6] that the above minimum 
(in the limit when $N \to \infty$) satisfies the following equations: 
where 

$$S_Z(\omega) = \frac{1}{2}\left(\sqrt{S_X(\omega)+\alpha} - \sqrt{S_X(\omega)}\right)\sqrt{S_X(\omega)}, \quad \forall\omega, \tag{3b}$$

is the PSD of the optimal distortion, which needs to be 
Gaussian. Notice that here the parameter $\alpha$ (akin to $\theta$ in (1)) 
does not represent a "water level". Indeed, unless $X$ is white, 
the PSD of the optimal distortion for $R^\perp(D)$ is not white, for 
all $D > 0$.¹ 
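The distortion PSD in (3b) can be evaluated numerically on a frequency grid. The sketch below solves for $\alpha$ by bisection so that the average of $S_Z$ equals a target $D$, and computes the rate as the mutual information rate of the additive Gaussian channel $Y = X + Z$ with $Z$ independent of $X$, that is, the average of $\frac{1}{2}\log_2(1 + S_X/S_Z)$. The function names, the grid and the AR(1) example are illustrative assumptions.

```python
import numpy as np

def uncorrelated_distortion_psd(S_x, alpha):
    """Equation (3b): optimal distortion PSD for a given alpha > 0."""
    return 0.5 * (np.sqrt(S_x + alpha) - np.sqrt(S_x)) * np.sqrt(S_x)

def solve_alpha(S_x, D, iters=200):
    """Geometric bisection on alpha so that the grid average of S_z equals D."""
    lo, hi = 1e-12, 1.0
    # The average distortion grows with alpha; expand the bracket until it covers D.
    while np.mean(uncorrelated_distortion_psd(S_x, hi)) < D:
        hi *= 2.0
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if np.mean(uncorrelated_distortion_psd(S_x, mid)) < D:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

def rate_bits(S_x, S_z):
    """Mutual information rate (bits/sample) of Y = X + Z, with Gaussian Z independent of X."""
    return np.mean(0.5 * np.log2(1.0 + S_x / S_z))

# Illustrative example: unit-variance AR(1) source with coefficient 0.9 (assumed values).
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S_x = (1 - 0.9**2) / np.abs(1 - 0.9 * np.exp(-1j * w))**2
alpha = solve_alpha(S_x, D=0.1)
S_z = uncorrelated_distortion_psd(S_x, alpha)
R_perp = rate_bits(S_x, S_z)
```

For this colored source the resulting `S_z` varies with frequency for every positive `D`, in line with the remark that $\alpha$ is not a water level.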

In the present paper we prove achievability of $R^\perp(D)$ 
by constructing coding schemes based on dithered lattice 
quantization, which, in the limit as the quantizer dimension 
approaches infinity, are able to achieve $R^\perp(D)$ for any positive 
$D$. We also show that $R^\perp(D)$ can be realized causally, i.e., that 
for all Gaussian sources and for all positive distortions one can 
build forward test channels that realize $R^\perp(D)$ without using 
non-causal filters. This is contrary to the case of Shannon's 
rate distortion function $R(D)$, where at least one of the filters 
of the forward test channel that realizes $R(D)$ needs to be 
non-causal [1]. To further illustrate the causality of $R^\perp(D)$, 
we present a causal transform coding architecture that realizes 
it. We also show that the use of feedback noise-shaping allows 
one to achieve $R^\perp(D)$ with memoryless entropy coding. 
This parallels a recent result by Zamir, Kochman and Erez 
for $R(D)$ [7]. We conclude the paper by showing that, in 
all the discussed architectures, the rate-loss (with respect to 
$R^\perp(D)$) when using a finite-dimensional quantizer can be 
upper bounded by the space-filling loss of the quantizer. 
Thus, for any Gaussian source with memory, by using noise-shaping 
and scalar dithered quantization, the scalar entropy 
(conditioned on the dither) of the quantized output exceeds 
$R^\perp(D)$ by at most 0.254 bit/dimension. 

¹Other similarities and differences between $R^\perp(D)$ and Shannon's $R(D)$ 
are discussed in [6]. 

II. Background on Dithered Lattice Quantization 

A randomized lattice quantizer is a lattice quantizer with 
subtractive dither $\nu$, followed by entropy encoding. The dither 
$\nu \sim \mathcal{U}(V_0)$ is uniformly distributed over a Voronoi cell $V_0$ of 
the lattice quantizer. Due to the dither, the quantization error 
is truly independent of the input. Furthermore, it was shown 
in [8] that the coding rate of the quantizer, i.e. 

$$R_{Q_N} \triangleq \frac{1}{N} H(Q_N(X+\nu) \mid \nu) \tag{4}$$

can be written as the mutual information between the input and 
the output of an additive noise channel $Y' = X + E'$, where $E'$ 
denotes the channel's additive noise and is distributed as $-\nu$. 
More precisely, $R_{Q_N} = \frac{1}{N} I(X;Y') = \frac{1}{N} I(X; X+E')$ and 
the quadratic distortion per dimension is given by 
$\frac{1}{N}\mathrm{E}\|Y' - X\|^2 = \frac{1}{N}\mathrm{E}\|E'\|^2$. 
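The scalar case ($N = 1$) of the randomized quantizer described above is easy to simulate. The sketch below uses a uniform scalar quantizer with step `step` (the lattice $\Delta\mathbb{Z}$) and subtractive dither that is uniform over one cell. It then estimates the error variance, which should be close to $\Delta^2/12$, and the correlation between error and input, which should be close to zero. The function name, the Gaussian input and all numerical values are illustrative assumptions.

```python
import numpy as np

def dithered_quantize(x, step, rng):
    """Subtractively dithered uniform scalar quantizer; returns the reconstruction."""
    dither = rng.uniform(-step / 2, step / 2, size=x.shape)
    q = step * np.round((x + dither) / step)   # lattice point nearest to x + dither
    return q - dither                          # the decoder subtracts the same dither

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200_000)         # illustrative Gaussian input
step = 0.5
err = dithered_quantize(x, step, rng) - x      # quantization error

err_var = float(np.var(err))                   # compare with step**2 / 12
corr = float(np.corrcoef(x, err)[0, 1])        # compare with 0
```

The entropy coder is omitted here; the sketch only illustrates the statistical properties of the error.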

It has furthermore been shown that when $\nu$ is white there 
exists a sequence of lattice quantizers $\{Q_N\}$ where the quantization 
error (and therefore also the dither) tends to be approximately 
Gaussian distributed (in the divergence sense) for large 
$N$. Specifically, let $E'$ have a probability density function (PDF) 
$f_{E'}$, and let $E'_G$ be Gaussian distributed with the same mean 
and covariance as $E'$. Then $\lim_{N\to\infty} \frac{1}{N} D(f_{E'}(e)\|f_{E'_G}(e)) = 0$ 
with a convergence rate of $\log(N)/N$ if the sequence $\{Q_N\}$ is 
chosen appropriately [9]. 

In the next section we will be interested in the case where 
the dither is not necessarily white. By shaping the Voronoi 
cells of a lattice quantizer whose dither $\nu$ is white, we also 
shape $\nu$, obtaining a colored dither $\nu'$. This situation was 
considered in detail in [9] from where we obtain the following 
lemma (which was proven in [9] but not put into a lemma). 

Lemma 1: Let $E \sim \mathcal{U}(V_0)$ be white, i.e. $E$ is uniformly 
distributed over the Voronoi cell $V_0$ of the lattice quantizer 
$Q_N$ and $K_E = \epsilon I$. Furthermore, let $E' \sim \mathcal{U}(V_0')$, where $V_0'$ 
denotes the shaped Voronoi cell $V_0' = \{x \in \mathbb{R}^N : M^{-1}x \in V_0\}$ 
and $M$ is some invertible linear transformation. Denote the 
covariance of $E'$ by $K_{E'} = MM^T\epsilon$. Similarly, let $E_G \sim \mathcal{N}(0, K_{E_G})$ 
having covariance matrix $K_{E_G} = K_E$ and let 
$E'_G \sim \mathcal{N}(0, K_{E'_G})$ where $K_{E'_G} = K_{E'}$. Then there exists a 
sequence of shaped lattice quantizers such that 

$$\frac{1}{N} D(f_{E'}(e)\|f_{E'_G}(e)) = O(\log(N)/N). \tag{5}$$



Proof: The divergence is invariant to invertible 
transformations since $h(E') = h(E) + \log_2(|M|)$. 
Thus, $D(f_{E'}(e)\|f_{E'_G}(e)) = D(f_{ME}(e)\|f_{ME_G}(e)) = D(f_E(e)\|f_{E_G}(e))$ 
for any $N$. ■ 
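The invariance used in this proof can be checked in closed form for a simple case. The sketch below assumes, purely for illustration, that the unshaped cell is the unit cube $[-1/2, 1/2)^N$ (the Voronoi cell of the integer lattice $\mathbb{Z}^N$), so that $h(E) = 0$ and $K_E = I/12$. It uses the identity $D(f\|g) = h(g) - h(f)$, valid when $g$ is Gaussian with the same mean and covariance as $f$. The function name and the random shaping matrix are hypothetical.

```python
import numpy as np

def divergence_from_gaussian_bits(M):
    """D(f_{ME} || f_{ME_G}) in bits, for E uniform on the unit cube [-1/2, 1/2)^N."""
    n = M.shape[0]
    _, logabsdet = np.linalg.slogdet(M)
    log2_det = logabsdet / np.log(2.0)
    # Shaped uniform vector: h(ME) = h(E) + log2|det M|, with h(E) = 0 for the unit cube.
    h_uniform = log2_det
    # Gaussian with covariance M M^T / 12: h = (1/2) log2((2 pi e)^n |M M^T| / 12^n).
    h_gauss = 0.5 * n * np.log2(2 * np.pi * np.e / 12.0) + log2_det
    return h_gauss - h_uniform

rng = np.random.default_rng(1)
N = 8
M = rng.normal(size=(N, N))                      # illustrative shaping matrix (invertible with probability 1)
d_white = divergence_from_gaussian_bits(np.eye(N))
d_shaped = divergence_from_gaussian_bits(M)      # equals d_white up to rounding
```

The term $\log_2|\det M|$ appears in both entropies and cancels, which is the mechanism behind the lemma's proof.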

III. Achievability of $R^\perp(D)$ 

The simplest forward channel that realizes $R^\perp(D)$ is shown 
in Fig. 1. According to (3), all that is needed for the mutual 
information per dimension between $X$ and $Y$ to equal $R^\perp(D)$ 
is that $Z$ be Gaussian with PSD equal to the right hand side 
(RHS) of (3b). 



Fig. 1: Forward test channel (input $X$, output $Y$). 



In view of the asymptotic properties of randomized lattice 
quantizers discussed in Section II, the achievability of $R^\perp(D)$ 
can be shown by replacing the test channel of Fig. 1 by an 
adequately shaped $N$-dimensional randomized lattice quantizer 
$Q'_N$ and then letting $N \to \infty$. In order to establish this result, 
the following lemma is needed. 

Lemma 2: Let $X$, $X'$, $Z$ and $Z'$ be mutually independent 
random vectors. Let $X'$ and $Z'$ be arbitrarily distributed, 
and let $X$ and $Z$ be Gaussian having the same mean and 
covariance as $X'$ and $Z'$, respectively. Then 

$$I(X'; X'+Z') \le I(X; X+Z) + D(Z'\|Z). \tag{6}$$

Proof: 

$$\begin{aligned} I(X'; X'+Z') &= h(X'+Z') - h(Z') \\ &\overset{(a)}{=} h(X+Z) - h(Z) + D(Z'\|Z) - D(X'+Z'\|X+Z) \\ &\le I(X; X+Z) + D(Z'\|Z), \end{aligned}$$

where (a) stems from the well known result $D(X'\|X) = h(X) - h(X')$, 
see, e.g., [10, p. 254]. ■ 

We can now prove the achievability of $R^\perp(D)$. 

Theorem 1: For a source $X$ being an infinite length Gaussian 
random vector with zero mean, $R^\perp(D)$ is achievable. 

Proof: Let $X^{(N)}$ be the sub-vector containing the first $N$ 
elements of $X$. For a fixed distortion $D = \mathrm{tr}(K_{Z^{(N)}})/N$, the 
average mutual information per dimension $\frac{1}{N} I(X^{(N)}; X^{(N)} + Z^{(N)})$ 
is minimized when $X^{(N)}$ and $Z^{(N)}$ are jointly Gaussian 
and 

$$K_{Z^{(N)}} = \frac{1}{2}\sqrt{K_{X^{(N)}}^2 + \alpha K_{X^{(N)}}} - \frac{1}{2}K_{X^{(N)}}, \tag{7}$$

see [6]. Let the $N$-dimensional shaped randomized lattice 
quantizer $Q'_N$ be such that the dither is distributed as 
$-E'^{(N)} \sim \mathcal{U}(V_0')$, with $K_{E'^{(N)}} = K_{Z^{(N)}}$. It follows 
that the coding rate of the quantizer is given by 
$R_{Q_N} = \frac{1}{N} I(X^{(N)}; X^{(N)} + E'^{(N)})$. The rate loss due to using $Q_N$ 
to quantize $X^{(N)}$ is given by 

$$\begin{aligned} R_{Q_N}(D) - R^\perp(D) &= \frac{1}{N}\left[ I(X^{(N)}; X^{(N)} + E'^{(N)}) - I(X^{(N)}; X^{(N)} + E'^{(N)}_G) \right] \\ &\overset{(a)}{\le} \frac{1}{N} D\left(f_{E'^{(N)}}(e)\,\|\,f_{E'^{(N)}_G}(e)\right), \end{aligned} \tag{8}$$

where $f_{E'^{(N)}_G}$ is the PDF of the Gaussian random vector 
$E'^{(N)}_G$, independent of $E'^{(N)}$ and $X^{(N)}$, and having the same 
first and second order statistics as $E'^{(N)}$. In (8), inequality (a) 
follows directly from Lemma 2, since the use of subtractive 
dither yields the error $E'^{(N)}$ independent of $X^{(N)}$. 

To complete the proof, we invoke Lemma 1, which guarantees 
that the RHS of (8) vanishes as $N \to \infty$. ■ 
Remark 1: 

1) For zero mean stationary Gaussian random sources, $R^\perp(D)$ 
is achieved by taking $X$ in Theorem 1 to be the complete input 
process. For this case, as shown in [6], the Fourier transform of 
the autocorrelation function of $Z^{(N)}$ tends to the RHS of (3b). 

2) For vector processes, the achievability of $R^\perp(D)$ follows 
by building $X$ in Theorem 1 from the concatenation 
of infinitely many consecutive vectors. 

3) Note that if one has an infinite number of parallel scalar 
random processes, $R^\perp(D)$ can be achieved causally by 
forming $X$ in Theorem 1 from the $k$-th sample of each 
of the processes and using entropy coding after $Q$. 

The fact that $R^\perp(D)$ can be realized causally is further 
illustrated in the following section. 

IV. Realization of $R^\perp(D)$ by Causal Transform Coding 

We will next show that for a Gaussian random vector 
$X \in \mathbb{R}^N$ with positive definite covariance matrix $K_X$, $R^\perp(D)$ 
can be realized by causal transform coding [11], [12]. A 
typical transform coding architecture is shown in Fig. 2. In 
this figure, $T$ is an $N \times N$ matrix, and $W$ is a Gaussian vector, 
independent of $X$, with covariance matrix $K_W = \sigma_W^2 I$. The 
system clearly satisfies the perfect reconstruction condition 
$Y = X + T^{-1}W$. The reconstruction error is the Gaussian 
random vector $Z \triangleq Y - X$, and the MSE is $D = \frac{1}{N}\mathrm{tr}\{K_Z\}$, 
where $K_Z = \sigma_W^2 T^{-1}T^{-T}$. 

Fig. 2: Transform coder. 

By restricting $T$ to be lower triangular, the transform coder 
in Fig. 2 becomes causal, in the sense that $\forall k \in \{1, \ldots, N\}$, 
the $k$-th elements of $V$ and $U$ can be determined using just 
the first $k$ elements of $X$ and the $k$-th element of $W$. 

To have $\frac{1}{N} I(X;Y) = R^\perp(D)$, it is necessary and sufficient 
that 

$$T^{-1}T^{-T} = K_{Z^*}/\sigma_W^2, \tag{9}$$

where the covariance matrix of the optimal distortion is [6] 

$$K_{Z^*} \triangleq \frac{1}{2}\sqrt{K_X^2 + \alpha K_X} - \frac{1}{2}K_X. \tag{10}$$



Since $T^{-1}$ is lower triangular, (9) is the Cholesky decomposition 
of $K_{Z^*}/\sigma_W^2$, which always exists.² Thus, $R^\perp(D)$ can 
be realized by causal transform coding. 
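The construction in (9) and (10) can be sketched numerically. Since $K_X$ is symmetric, $K_{Z^*}$ in (10) shares its eigenvectors, so the matrix square root reduces to a scalar function of the eigenvalues. The function names, the AR(1)-type Toeplitz covariance and the values of $\alpha$ and $\sigma_W^2$ are illustrative assumptions.

```python
import numpy as np

def optimal_distortion_cov(K_x, alpha):
    """Equation (10), evaluated through the eigendecomposition of the symmetric K_x."""
    lam, V = np.linalg.eigh(K_x)
    mu = 0.5 * (np.sqrt(lam**2 + alpha * lam) - lam)   # eigenvalues of K_z*
    return (V * mu) @ V.T                              # V diag(mu) V^T

def causal_transform(K_x, alpha, sigma_w2=1.0):
    """Return a lower-triangular T with T^{-1} T^{-T} = K_z* / sigma_w2, as in (9)."""
    K_z = optimal_distortion_cov(K_x, alpha)
    L = np.linalg.cholesky(K_z / sigma_w2)   # lower triangular, L L^T = K_z* / sigma_w2
    T = np.linalg.inv(L)                     # the inverse of a lower-triangular matrix is lower triangular
    return T, K_z

# Illustrative example: N = 6 samples of a unit-variance AR(1) process (assumed values).
N, a, alpha, sigma_w2 = 6, 0.9, 0.5, 1.0
idx = np.arange(N)
K_x = a ** np.abs(idx[:, None] - idx[None, :])
T, K_z = causal_transform(K_x, alpha, sigma_w2)

T_inv = np.linalg.inv(T)
K_z_realized = sigma_w2 * T_inv @ T_inv.T    # covariance of Z = T^{-1} W
mse = np.trace(K_z) / N                      # distortion D = tr(K_z) / N
```

Here `K_z_realized` matches `K_z` up to rounding, and `T` has no entries above its main diagonal.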

In practice, transform coders are implemented by replacing 
the (vector) AWGN channel $U = V + W$ by a quantizer 
(or several quantizers) followed by entropy coding. The latter 
process is simplified if the quantized outputs are independent. 
When using quantizers with subtractive dither, this can be 
shown to be equivalent to having $\frac{1}{N}\sum_{k=1}^{N} I(U_k - W_k; U_k) = \frac{1}{N} I(V;U)$ 
in the transform coder when using the AWGN 
channel. Notice that, since $T$ in (9) is invertible, the mutual 
information per dimension $\frac{1}{N} I(V;U)$ is also equal to $R^\perp(D)$. 
By the chain rule of mutual information we have 

$$\frac{1}{N}\sum_{k=1}^{N} I(U_k - W_k; U_k) \ge \frac{1}{N} I(V;U) = R^\perp(D), \tag{11}$$

with equality iff the elements of $U$ are mutually independent. 
If $U$ is Gaussian, this is equivalent to $K_U$ being diagonal. 
Clearly, this cannot be obtained with the architecture shown 
in Fig. 2 using causal matrices (while at the same time 
satisfying (9)). However, it can be achieved by using error 
feedback, as we show next. 

Consider the scheme shown in Fig. 3, where $A \in \mathbb{R}^{N\times N}$ 
is lower triangular and $F \in \mathbb{R}^{N\times N}$ is strictly lower triangular. 
Again, a sufficient and necessary condition to have 
$\frac{1}{N} I(X;Y) = R^\perp(D)$ is that $K_Z = K_{Z^*}$, see (10), i.e., 

$$\sigma_W^2 A^{-1}(I-F)\left[A^{-1}(I-F)\right]^T = K_{Z^*} \iff (I-F)(I-F)^T = A K_{Z^*} A^T/\sigma_W^2. \tag{12}$$

Fig. 3: A causal transform coding scheme with error feedback. 

On the other hand, equality in (11) is achieved only if 

$$K_U = A K_X A^T + \sigma_W^2 (I-F)(I-F)^T = \mathbf{D}, \tag{13}$$

for some diagonal matrix $\mathbf{D}$ with positive elements. If we 
substitute the Cholesky factorization $K_{Z^*} = LL^T$ into (12), 
we obtain $(I-F)(I-F)^T = ALL^TA^T/\sigma_W^2$, and thus 

$$A = \sigma_W (I-F)L^{-1}. \tag{14}$$

²Furthermore, since $K_{Z^*} > 0$, there exists a unique $T$ having only positive 
elements on its main diagonal that satisfies (9), see [13]. 

Substituting the above into (13) we obtain 



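$$\mathbf{D} = \sigma_W^2 (I-F)\left[L^{-1}K_X L^{-T} + I\right](I-F)^T. \tag{15}$$

Thus, there exist³ $A$ and $F$ satisfying (12) and (13). Substitution 
of (12) into (13) yields $\mathbf{D} = A(K_X + K_{Z^*})A^T$, and 
$\log|\mathbf{D}| = 2\log|A| + \log|K_X + K_{Z^*}|$. From (12) and the 
fact that $|I - F| = 1$ it follows that $|A|^2 = \sigma_W^{2N}/|K_{Z^*}|$, and 
therefore⁴ 

$$\frac{1}{N}\sum_{k=1}^{N} I(U_k - W_k; U_k) = \frac{1}{2N}\log\frac{|\mathbf{D}|}{\sigma_W^{2N}} = \frac{1}{2N}\log|K_X + K_{Z^*}| - \frac{1}{2N}\log|K_{Z^*}| = R^\perp(D), \tag{16}$$

thus achieving equality in (11). 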
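One way to compute matrices $A$ and $F$ satisfying (12) and (13) is sketched below. It obtains $I - F$ from a unit-lower-triangular (LDL-type) factorization of the bracketed matrix in (15) and then sets $A$ through (14). This is a hedged illustration rather than the authors' procedure; the function name, the example covariance and the parameter values are assumptions.

```python
import numpy as np

def feedback_design(K_x, K_z, sigma_w2=1.0):
    """Return (A, F, K_u) with K_u diagonal, following (13), (14) and (15)."""
    n = K_x.shape[0]
    L = np.linalg.cholesky(K_z)                  # K_z = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ K_x @ L_inv.T + np.eye(n)        # bracketed matrix in (15)
    C = np.linalg.cholesky(M)                    # M = C C^T
    G = C / np.diag(C)                           # column scaling: M = G diag(c_kk^2) G^T, G unit lower triangular
    I_minus_F = np.linalg.inv(G)                 # unit lower triangular, so F is strictly lower triangular
    F = np.eye(n) - I_minus_F
    A = np.sqrt(sigma_w2) * I_minus_F @ L_inv    # equation (14)
    K_u = A @ K_x @ A.T + sigma_w2 * I_minus_F @ I_minus_F.T   # equation (13)
    return A, F, K_u

# Illustrative example (assumed values): AR(1)-type covariance and K_z* from (10).
N, a, alpha, sigma_w2 = 6, 0.9, 0.5, 2.0
idx = np.arange(N)
K_x = a ** np.abs(idx[:, None] - idx[None, :])
lam, V = np.linalg.eigh(K_x)
K_z = (V * (0.5 * (np.sqrt(lam**2 + alpha * lam) - lam))) @ V.T

A, F, K_u = feedback_design(K_x, K_z, sigma_w2)

# Average scalar mutual information across the N AWGN channels, in bits.
scalar_rate = np.mean(0.5 * np.log2(np.diag(K_u) / sigma_w2))
# Vector rate (1 / 2N) [log|K_x + K_z*| - log|K_z*|], in bits.
vector_rate = (np.linalg.slogdet(K_x + K_z)[1] - np.linalg.slogdet(K_z)[1]) / (2 * N * np.log(2))
```

In this sketch `K_u` comes out diagonal and `scalar_rate` agrees with `vector_rate`, which is the equality stated in (16).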

We have seen that the use of error feedback allows one to 
make the average scalar mutual information between the input 
and output of each AWGN channel in the transform domain 
equal to $R^\perp(D)$. In the following section we show how this 
result can be extended to stationary Gaussian processes. 

V. Achieving $R^\perp(D)$ by Noise Shaping 

In this section we show that, for any colored stationary 
Gaussian source and for any positive distortion, 
$R^\perp(D)$ can be realized by noise shaping, and that $R^\perp(D)$ 
is achievable using memoryless entropy coding. 

A. Realization of $R^\perp(D)$ by Noise-Shaping 

The fact that $R^\perp(D)$ can be realized by the additive colored 
Gaussian noise test channel of Fig. 1 suggests that $R^\perp(D)$ 
could also be achieved by an additive white Gaussian noise 
(AWGN) channel embedded in a noise-shaping feedback loop, 
see Fig. 4. In this figure, $\{X_k\}$ is a Gaussian stationary process 
with PSD $S_X(e^{j\omega})$. The filters $A(z)$ and $F(z)$ are LTI. The 
AWGN channel is situated between $V$ and $U$, where white 
Gaussian noise $\{W_k\}$, independent of $\{X_k\}$, is added. The 
reconstructed signal $Y$ is obtained by passing $U$ through the 
filter $A(z)^{-1}$, yielding the reconstruction error $Z_k = Y_k - X_k$. 



Fig. 4: Test channel built by embedding the AWGN channel 
$U_k = V_k + W_k$ in a noise feedback loop (input $X$, output $Y$). 

The following theorem states that, for this scheme, the 
scalar mutual information across the AWGN channel can 
actually equal $R^\perp(D = \sigma_Z^2)$. 

³For any positive definite matrices $K_X$ and $K_{Z^*} = LL^T$, there exists a 
unique matrix $F$ having zeros on its main diagonal that satisfies (15), see [14]. 

⁴The last equality in (16) follows from the expression for $R^\perp(D)$ for 
Gaussian vector sources derived in [6]. 



Theorem 2: Consider the scheme in Fig. 4. Let $\{X_k\}$, 
$\{W_k\}$ be independent stationary Gaussian random processes. 
Suppose that the differential entropy rate of $\{X_k\}$ is bounded, 
and that $\{W_k\}$ is white. Then, for every $D > 0$, there exist 
causal and stable filters $A(z)$, $A(z)^{-1}$ and $F(z)$ such that 

$$I(V_k; U_k) = R^\perp(D), \quad \text{where } D \triangleq \sigma_Z^2. \tag{17}$$



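Proof: Consider all possible choices of the filters $A(z)$ 
and $F(z)$ such that the obtained sequence $\{U_k\}$ is white, i.e., 
such that $S_U(e^{j\omega}) = \sigma_U^2$, $\forall\omega \in [-\pi, \pi]$. From Fig. 4, this is 
achieved iff the filters $A(z)$ and $F(z)$ satisfy 

$$\sigma_U^2 = |A(e^{j\omega})|^2 S_X(e^{j\omega}) + |1 - F(e^{j\omega})|^2 \sigma_W^2. \tag{18}$$

On the other hand, since $\{W_k\}$ is Gaussian, a necessary and 
sufficient condition in order to achieve $R^\perp(D)$ is that 

$$S_Z(e^{j\omega}) = |1 - F(e^{j\omega})|^2\, |A(e^{j\omega})|^{-2}\, \sigma_W^2 \tag{19}$$

$$= \frac{1}{2}\left(\sqrt{S_X(e^{j\omega}) + \alpha} - \sqrt{S_X(e^{j\omega})}\right)\sqrt{S_X(e^{j\omega})} \tag{20}$$

$$\triangleq S_{Z^*}(e^{j\omega}), \quad \forall\omega \in [-\pi, \pi]. \tag{21}$$

This holds iff $|A(e^{j\omega})|^2 = \sigma_W^2\, |1 - F(e^{j\omega})|^2 / S_{Z^*}(e^{j\omega})$. 
Substituting the latter and (21) into (18), and after some 
algebra, we obtain 

$$|1 - F(e^{j\omega})|^2 = \frac{\sigma_U^2}{\sigma_W^2}\, \frac{\left[\sqrt{S_X(e^{j\omega}) + \alpha} - \sqrt{S_X(e^{j\omega})}\right]^2}{\alpha}, \tag{22a}$$

$$|A(e^{j\omega})|^2 = \frac{2\sigma_U^2 \left[\sqrt{S_X(e^{j\omega}) + \alpha} - \sqrt{S_X(e^{j\omega})}\right]}{\alpha\sqrt{S_X(e^{j\omega})}}. \tag{22b}$$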

Notice that the functions on the right hand sides of (22) 
are bounded and positive for all $\omega \in [-\pi, \pi]$, and that 
a bounded differential entropy rate of $\{X_k\}$ implies that 
$\left|\int_{-\pi}^{\pi} \log S_X(e^{j\omega})\, d\omega\right| < \infty$. From the Paley-Wiener criterion [15] 
(see also, e.g., [16]), this implies that $(1 - F(z))$, $A(z)$ and 
$A(z)^{-1}$ can be chosen to be stable and causal. Furthermore, 
recall that for any fixed $D > 0$, the corresponding value of $\alpha$ 
is unique (see [6]), and thus fixed. Since the variance $\sigma_W^2$ is 
also fixed, it follows that each frequency response magnitude 
$|1 - F(e^{j\omega})|$ that satisfies (22a) can be associated to a unique 
value of $\sigma_U^2$. Since $F(z)$ is strictly causal and stable, the 
minimum value of the variance $\sigma_U^2$ is achieved when 

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|1 - F(e^{j\omega})\right| d\omega = 0, \tag{23}$$

i.e., if $1 - F(z)$ has no zeros outside the unit circle (equivalently, 
if $1 - F(z)$ is minimum phase), see, e.g., [17]. If we 
choose in (22a) a filter $F(z)$ that satisfies (23), and then 
take the logarithm and integrate both sides of (22a), we obtain 

$$\frac{1}{2}\log\left(\frac{\sigma_U^2}{\sigma_W^2}\right) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left(\frac{\sqrt{\alpha}}{\sqrt{S_X(e^{j\omega}) + \alpha} - \sqrt{S_X(e^{j\omega})}}\right) d\omega = R^\perp(D),$$

where (3a) has been used. We then have that 

$$\begin{aligned} R^\perp(D) &= \frac{1}{2}\log\left(\frac{\sigma_U^2}{\sigma_W^2}\right) = \frac{1}{2}\log(2\pi e\,\sigma_U^2) - \frac{1}{2}\log(2\pi e\,\sigma_W^2) \\ &\overset{(a)}{=} h(U_k) - h(W_k) \overset{(b)}{=} h(U_k) - h(V_k + W_k \mid V_k) = I(V_k; U_k), \end{aligned}$$

where (a) follows from the Gaussianity of $W_k$ and $U_k$, and 
(b) from the fact that $W_k$ is independent of $V_k$ (since $F$ is 
strictly causal). This completes the proof. Alternatively, 

$$\begin{aligned} R^\perp(D) &\overset{(a)}{\le} \bar{I}(\{X_k\}; \{Y_k\}) \\ &= \bar{h}(A^{-1}\{U_k\}) - \bar{h}(\{X_k\} + A^{-1}(1-F)\{W_k\} \mid \{X_k\}) \\ &= \bar{h}(A^{-1}\{U_k\}) - \bar{h}(A^{-1}(1-F)\{W_k\}) \\ &\overset{(b)}{=} \bar{h}(\{U_k\}) - \bar{h}((1-F)\{W_k\}) \\ &\overset{(c)}{\le} h(U_k \mid U_k^{-}) - h(W_k) \overset{(d)}{\le} h(U_k) - h(W_k) \\ &\overset{(e)}{=} h(U_k) - h(V_k + W_k \mid V_k) = I(V_k; U_k). \end{aligned}$$

In (a), equality is achieved iff the right hand side of (19) 
equals (20), i.e., if $Z$ has the optimal PSD. Equality (b) 
holds because $\left|\int_{-\pi}^{\pi} \log|A(e^{j\omega})|\, d\omega\right| < \infty$, which follows 
from (22b). The fact that $\{U_k\}$ is stationary has been used 
in (c), wherein equality is achieved iff $1 - F$ is minimum 
phase, i.e., if (23) holds. Equality in (d) holds if and only 
if the elements of $\{U_k\}$ are independent, which, from the 
Gaussianity of $\{U_k\}$, is equivalent to (18). Finally, (e) stems 
from the fact that $W_k$ is independent of $V_k$. ■ 
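The conditions used in this proof can be checked numerically at the level of magnitude responses. The sketch below builds $|1 - F|^2$ and $|A|^2$ on a frequency grid under the whiteness condition (18), the optimal-PSD condition (19) to (21) and the minimum-phase condition (23), and then compares the scalar rate $\frac{1}{2}\log_2(\sigma_U^2/\sigma_W^2)$ with the rate of the test channel $Y = X + Z$. It does not design the causal filters themselves (no spectral factorization is performed). The AR(1) source and the values of `a`, `alpha` and `sigma_w2` are illustrative assumptions.

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
a, alpha, sigma_w2 = 0.9, 0.5, 1.0                       # illustrative values
S_x = (1 - a**2) / np.abs(1 - a * np.exp(-1j * w))**2    # AR(1) PSD, unit variance

root = np.sqrt(S_x + alpha) - np.sqrt(S_x)
S_z = 0.5 * root * np.sqrt(S_x)                          # optimal distortion PSD, (20)
shape = S_z / (S_x + S_z)                                # |1 - F|^2 * sigma_w2 / sigma_u2, from (18) and (19)

# Minimum phase, (23): the average of log|1 - F|^2 must be zero, which fixes sigma_u2.
sigma_u2 = sigma_w2 * np.exp(-np.mean(np.log(shape)))
one_minus_F_sq = (sigma_u2 / sigma_w2) * shape           # squared magnitude of 1 - F
A_sq = sigma_w2 * one_minus_F_sq / S_z                   # squared magnitude of A, from (19)

S_u = A_sq * S_x + one_minus_F_sq * sigma_w2             # right hand side of (18)
scalar_rate = 0.5 * np.log2(sigma_u2 / sigma_w2)         # I(V_k; U_k) for Gaussian signals, in bits
rate_perp = np.mean(0.5 * np.log2(1 + S_x / S_z))        # rate of the test channel Y = X + Z, in bits
```

In this sketch `S_u` is flat at `sigma_u2`, and `scalar_rate` coincides with `rate_perp` up to rounding, mirroring the statement of Theorem 2.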
Notice that the key to the proof of Theorem 2 relies on 
knowing a priori the PSD of the end to end distortion required 
to realize $R^\perp(D)$. Indeed, one could also use this fact to 
realize $R^\perp(D)$ by embedding the AWGN in a DPCM feedback 
loop, and then following a reasoning similar to that in [7]. 

B. Achieving $R^\perp(D)$ Through Feedback Quantization 

In order to achieve $R^\perp(D)$ by using a quantizer instead of 
an AWGN channel, one would require the quantization errors 
to be Gaussian. This cannot be achieved with scalar quantizers. 
However, as we have seen in Section II, dithered lattice quantizers are 
able to yield quantization errors approximately Gaussian as 
the lattice dimension tends to infinity. The sequential (causal) 
nature of the feedback architecture does not immediately 
allow for the possibility of using vector quantizers. However, 
if several sources are to be processed simultaneously, we 
can overcome this difficulty by using an idea suggested 
in [7] where the sources are processed in parallel by separate 
feedback quantizers. The feedback quantizers operate 
independently of each other except that their scalar quantizers 
are replaced by a single vector quantizer. If the number of 
parallel sources is large, then the vector quantizer guarantees 
that the marginal distributions of the individual components 
of the quantized vectors become approximately Gaussian. 
Thus, due to the dithering within the vector quantizer, 
each feedback quantizer observes a sequence of i.i.d. Gaussian 
quantization noises. Furthermore, the effective coding rate (per 
source) is that of a high dimensional entropy constrained 
dithered quantizer (per dimension). 

The fact that the scalar mutual information between $V_k$ and 
$U_k$ equals the mutual information rate between $\{V_k\}$ and 
$\{U_k\}$ in each of the parallel coders implies that $R^\perp(D)$ can 
be achieved by using a memoryless entropy coder. 

VI. Rate Loss with Dithered Feedback Quantization 

The results presented in Sections IV and V suggest that if a 
test channel embedding an AWGN channel realizes $R^\perp(D)$, 
then a source coder obtained by replacing the AWGN channel 
by a dithered, finite dimensional lattice quantizer, would 
exhibit a rate close to $R^\perp(D)$. 

The next theorem, whose proof follows the line of the results 
given in [7, sec. VII], provides an upper bound on the rate-loss 
incurred in this case. 

Theorem 3: Consider a source coder with a finite dimensional 
subtractively dithered lattice quantizer $Q$. If when 
replacing the quantizer by an AWGN channel the scalar mutual 
information across the channel equals $R^\perp(D)$, then the scalar 
entropy of the quantized output exceeds $R^\perp(D)$ by at most 
0.254 bit/dimension. 

Proof: Let $W$ be the noise of the AWGN channel, and 
$V$ and $U$ denote the channel input and output signals. From 
the conditions of the theorem, we have that 

$$I(V_k; U_k) = R^\perp(D). \tag{24}$$

If we now replace the AWGN by a dithered quantizer with 
subtractive dither $\nu$, such that the quantization noise $W'$ is 
obtained with the same first and second order statistics as $W$, 
then the end to end MSE remains the same. The corresponding 
signals in the quantized case, namely $V'$ and $U'$, will also have 
the same second order statistics as their Gaussian counterparts 
$V$ and $U$. Thus, by using Lemma 2 we obtain 

$$I(V'_k; U'_k) \le R^\perp(D) + D(U'_k\|U_k). \tag{25}$$

Finally, from [8, Theorem 1], we have that 
$H(Q(V'_k + \nu_k) \mid \nu_k) = I(V'_k; U'_k)$. Substitution of (25) into this last 
equation yields the result. ■ 
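The figure of 0.254 bit/dimension can be reproduced from the space-filling loss, which for a lattice with normalized second moment $G$ is commonly written as $\frac{1}{2}\log_2(2\pi e\,G)$. The quantity $G$ is not defined in this paper; the sketch below assumes the standard values $G = 1/12$ for the scalar (integer) lattice and $G = 5/(36\sqrt{3})$ for the hexagonal lattice, together with the limiting value $1/(2\pi e)$ as the dimension grows. Names and the choice of lattices are illustrative.

```python
import math

def space_filling_loss_bits(G):
    """Space-filling loss (1/2) log2(2 pi e G), in bits per dimension."""
    return 0.5 * math.log2(2 * math.pi * math.e * G)

G_SCALAR = 1.0 / 12.0                        # integer lattice Z (uniform scalar quantizer)
G_HEXAGONAL = 5.0 / (36.0 * math.sqrt(3))    # hexagonal lattice in two dimensions
G_LIMIT = 1.0 / (2 * math.pi * math.e)       # limiting value as the dimension tends to infinity

loss_scalar = space_filling_loss_bits(G_SCALAR)
loss_hexagonal = space_filling_loss_bits(G_HEXAGONAL)
loss_limit = space_filling_loss_bits(G_LIMIT)
```

The scalar value is the worst case among these, which is why the bound in Theorem 3 is stated as 0.254 bit/dimension.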

VII. Conclusions 

We have proved the achievability of $R^\perp(D)$ by using 
lattice quantization with subtractive dither. We have shown 
that $R^\perp(D)$ can be realized causally, and that the use of 
feedback allows one to achieve $R^\perp(D)$ by using memoryless 
entropy coding. We also showed that the scalar entropy of 
the quantized output when using optimal finite-dimensional 
dithered lattice quantization exceeds $R^\perp(D)$ by at most 0.254 
bit/dimension. 